The overall goal of this presentation is to show how housing prices vary with location and proximity to the ocean. The main features are housing_median_age (years), median_income (tens of thousands of USD), median_house_value (USD) and ocean_proximity.
The data analyzed is the California housing price dataset, downloaded from Kaggle. The dataset contains 20,640 observations and 10 features. The features appear as the columns of the sample below:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
%matplotlib inline
# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
# load in the dataset into a pandas dataframe
df = pd.read_csv('housing.csv')
df.sample(5)
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 12032 | -117.47 | 33.93 | 33.0 | 919.0 | 208.0 | 724.0 | 235.0 | 3.4028 | 110500.0 | INLAND |
| 10718 | -117.83 | 33.65 | 8.0 | 2149.0 | 426.0 | 950.0 | 399.0 | 4.1103 | 250400.0 | <1H OCEAN |
| 2494 | -120.19 | 36.60 | 25.0 | 875.0 | 214.0 | 931.0 | 214.0 | 1.5536 | 58300.0 | INLAND |
| 18368 | -121.98 | 37.16 | 42.0 | 2533.0 | 433.0 | 957.0 | 398.0 | 5.3468 | 279900.0 | <1H OCEAN |
| 9973 | -122.40 | 38.53 | 24.0 | 1741.0 | 289.0 | 564.0 | 231.0 | 3.6118 | 248400.0 | INLAND |
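A quick structural check helps confirm the row and column counts quoted above. The sketch below rebuilds only the five sampled rows from the table, since `housing.csv` is not bundled here; the name `sample` is an illustrative assumption, and the same three calls can be run on the full `df`.

```python
import pandas as pd

# Five rows copied from the sample table above (index = original row labels).
sample = pd.DataFrame(
    {
        "longitude": [-117.47, -117.83, -120.19, -121.98, -122.40],
        "latitude": [33.93, 33.65, 36.60, 37.16, 38.53],
        "housing_median_age": [33.0, 8.0, 25.0, 42.0, 24.0],
        "total_rooms": [919.0, 2149.0, 875.0, 2533.0, 1741.0],
        "total_bedrooms": [208.0, 426.0, 214.0, 433.0, 289.0],
        "population": [724.0, 950.0, 931.0, 957.0, 564.0],
        "households": [235.0, 399.0, 214.0, 398.0, 231.0],
        "median_income": [3.4028, 4.1103, 1.5536, 5.3468, 3.6118],
        "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
        "ocean_proximity": ["INLAND", "<1H OCEAN", "INLAND", "<1H OCEAN", "INLAND"],
    },
    index=[12032, 10718, 2494, 18368, 9973],
)

# Number of observations and features.
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")

# Data type of each feature, and how many values are missing per column.
print(sample.dtypes)
print(sample.isna().sum())
```

On the full dataset, `df.shape` should match the 20,640 observations and 10 features stated earlier, and `df.isna().sum()` shows which columns contain the null values that are dropped in the next step.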
# Drop null values
df.dropna(axis=0, inplace=True)
# Change the datatype of some features from `float` to `int`
obs = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households']
for v in obs:
    df[v] = df[v].astype('int')
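The effect of the two cleaning steps is easier to see on a small frame. The following is a minimal sketch on a toy DataFrame with one deliberately missing value; `toy`, `count_cols`, and `cleaned` are hypothetical names, and the values are for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame: the second row has a missing total_bedrooms value.
toy = pd.DataFrame(
    {
        "housing_median_age": [33.0, 8.0, 25.0],
        "total_bedrooms": [208.0, np.nan, 214.0],
        "median_income": [3.4028, 4.1103, 1.5536],
    }
)

# Columns that hold whole-number counts and can safely become integers.
count_cols = ["housing_median_age", "total_bedrooms"]

# Drop rows with any missing value first, then cast the count columns.
cleaned = toy.dropna(axis=0).copy()
cleaned[count_cols] = cleaned[count_cols].astype("int64")

print(cleaned.dtypes)
print(f"rows before: {len(toy)}, rows after: {len(cleaned)}")
```

The order matters: casting a float column that still contains `NaN` to an integer type raises an error, so the nulls have to be dropped (or filled) before the conversion.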
Household income and house value are positively correlated, as shown in the plot below:
# Scatter plot of house value and income
sb.scatterplot(data=df, x='median_income', y='median_house_value')
plt.xlabel('Income [Tens of Thousands USD]')
plt.ylabel('House Value [USD]')
plt.title('Income vs House Value');
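The visual impression from the scatter plot can be backed by a correlation coefficient. The sketch below computes Pearson's r on the five rows from the sample table only, so the number it prints is not the full-dataset result; `sample` and `r` are illustrative names.

```python
import pandas as pd

# Income and house value for the five rows shown in the sample table.
sample = pd.DataFrame(
    {
        "median_income": [3.4028, 4.1103, 1.5536, 5.3468, 3.6118],
        "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
    }
)

# Series.corr uses the Pearson method by default.
r = sample["median_income"].corr(sample["median_house_value"])
print(f"Pearson r on the five sample rows: {r:.2f}")
```

Five rows are far too few to draw conclusions from; running the same `corr` call on the cleaned `df` gives the figure that corresponds to the plot.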
Housing age shows no clear relationship with house value.
# Scatter plot of house age and house value
sb.scatterplot(data=df, x='housing_median_age', y='median_house_value')
plt.xlabel('Housing Age [Years]')
plt.ylabel('House Value [USD]')
plt.title('Housing Age vs House Value');
The location of a house affects its value: the closer it is to the water, the higher the value.
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        center={'lat':37.09, 'lon':-121},
                        height=600,
                        width=600,
                        color='median_house_value',
                        hover_data=['ocean_proximity'])
fig.update_layout(mapbox_style='open-street-map', title='Housing Price and Location')
fig.show()
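A grouped summary is one way to put numbers behind the map. The sketch below takes the median house value per `ocean_proximity` category, again using only the five rows from the sample table, so just two of the dataset's categories appear; `sample` and `median_by_proximity` are illustrative names.

```python
import pandas as pd

# Proximity category and house value for the five rows in the sample table.
sample = pd.DataFrame(
    {
        "ocean_proximity": ["INLAND", "<1H OCEAN", "INLAND", "<1H OCEAN", "INLAND"],
        "median_house_value": [110500.0, 250400.0, 58300.0, 279900.0, 248400.0],
    }
)

# Median house value per category, highest first.
median_by_proximity = (
    sample.groupby("ocean_proximity")["median_house_value"]
    .median()
    .sort_values(ascending=False)
)
print(median_by_proximity)
```

Applied to the full `df`, the same `groupby` gives one median per proximity category, which can be compared directly with the colour pattern on the map.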
Generate Slideshow: Once you're ready to generate your slideshow, use the
`jupyter nbconvert` command to generate the HTML slide show. From the terminal or command line, use the following expression (the leading `!` is only needed when running it from a notebook cell).
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt
This should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel.